Abstract. In the first part of this paper we give an elementary proof of the fact that if an infinite matrix $A$, which is invertible as a bounded operator on $\ell^2$, can be uniformly approximated by banded matrices then so can the inverse of $A$. We give explicit formulas for the banded approximations of $A^{-1}$ as well as bounds on their accuracy and speed of convergence in terms of their band-width. In the second part we apply these results to covariance matrices $\Sigma$ of Gaussian processes and study mixing and beta mixing of processes in terms of properties of $\Sigma$. Finally, we note some applications of our results to statistics.

1 Introduction

Let $\mathbb{I}$ be either the set of natural numbers, $\mathbb{N}$, or the integers, $\mathbb{Z}$, and let $\ell^2$ denote the corresponding Hilbert space of one- or two-sided infinite sequences $(u_k)_{k\in\mathbb{I}}$ of complex numbers with $\sum_{k\in\mathbb{I}} |u_k|^2 < \infty$. We know that every bounded linear operator $A$ on $\ell^2$ can naturally be identified with a (one- or two-sided) infinite matrix $(a_{ij})_{i,j\in\mathbb{I}}$. We will therefore use the words 'operator' and 'matrix' synonymously here.

It is clear that the inverse of a banded infinite matrix $A$, if it exists, is in general not a banded matrix any more. However, one can show that it is still the uniform limit of a sequence of such, that is: it is what we call a band-dominated matrix. By a simple approximation argument, this result immediately implies that the class of band-dominated matrices is inverse closed, that means: if one of them is invertible, its inverse is again band-dominated.

Outline of the Paper. We will give a proof of this result in Section 2 which moreover comes with explicit formulas for the banded approximations of the inverse and with bounds on their accuracy and speed of convergence in terms of their band-width. In Section 3 we apply these formulas to another class of operators, the so-called Wiener algebra. These inverse closedness results themselves are not new (see e.g. Kurbatov [9, 10, 11] but also [12, 13, 16, 19, 20, 17, 14] for related questions) but what we believe is new here is our approach and the explicit approximations of the inverse that it comes with, as well as the generalizations of our results to the operator classes defined in Section 4 with an eye to applications in statistics.

Next, in Section 5, we study the relation between our results in Sections 2 and 3 and the notion of 
regularity in Gaussian processes, a well settled problem in the stationary case, clarified, as we believe, for 
the first time in the general (non-stationary) case. In the same section, we consider the characterization 
of the notion of beta mixing (or absolute regularity) for Gaussian processes, a problem considered by 
Ibragimov and Solev [6] for the stationary case. We give a necessary and sufficient condition for beta 
mixing in the general case, relating it to closure notions stronger than those in the previous sections. 
Finally, in Section 6, we sketch the applications to statistics which initially prompted this work.

Notations. Here and in what follows, let $\ell^2 := \ell^2(\mathbb{I})$ stand for the set of all sequences $x = (x_k)_{k\in\mathbb{I}}$ of complex numbers with

$$\|x\| := \Big(\sum_{k\in\mathbb{I}} |x_k|^2\Big)^{1/2} < \infty,$$

where the index set $\mathbb{I}$ is fixed. For every bounded and linear operator $A$ on $\ell^2$, let $A^*$ denote the adjoint operator with matrix representation $(a_{ij})^* = (\overline{a_{ji}})$ and let $\|A\|$ denote the induced operator norm, that is

$$\|A\| = \sup_{x\in\ell^2\setminus\{0\}} \frac{\|Ax\|}{\|x\|}.$$

By putting $\mathbb{I} = \{1,\dots,n\}$ in the above, we define $\|x\|$ and $\|A\|$ as is usual for finite vectors $x\in\mathbb{C}^n$ and $n\times n$ matrices $A$ acting on them. We will now suppose that $\mathbb{I}\in\{\mathbb{Z},\mathbb{N}\}$.

Let $BL := BL(\ell^2)$ be the set of all bounded linear operators $A:\ell^2\to\ell^2$. Equipped with addition, multiplication by scalars, operator composition and the above norm, $BL$ is a Banach algebra and even a $C^*$-algebra with involution $A\mapsto A^*$. If $A\in BL$ is invertible (i.e. a bijection $\ell^2\to\ell^2$) then also $A^{-1}\in BL$ as a consequence of the open mapping theorem. Now let $BO := BO(\ell^2)$ refer to the set of all operators $A\in BL$ that are induced by a banded matrix, meaning a matrix with only finitely many nonzero diagonals. Clearly, the set $BO$ is closed under addition, multiplication, multiplication by scalars and under passing to the adjoint, but it is not closed in the operator norm $\|\cdot\|$ on $BL$. That is why one is interested in the norm closure of $BO$, henceforth denoted by $BDO := BDO(\ell^2)$, the elements of which are called band-dominated operators/matrices.

From a computational point of view the operator norm $\|\cdot\|$ is not very handy. An alternative norm $[\![\cdot]\!]$ can be defined on $BO$ as follows: For $A\in BO$ with matrix representation $(a_{ij})_{i,j\in\mathbb{I}}$ and for each $k\in\mathbb{Z}$, let $d_k$ be the supremum norm of the $k$-th diagonal of $A$, that is

$$d_k := \sup\{|a_{ij}| : i,j\in\mathbb{I},\ i-j=k\}, \quad\text{and put}\quad [\![A]\!] := \sum_{k\in\mathbb{Z}} d_k. \tag{1}$$
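For a finite section of a matrix, the quantities in (1) are easy to compute. The following is a minimal sketch in plain Python; the function names `diagonal_suprema`, `wiener_norm` and `bandwidth` are ours and purely illustrative, and the matrix is assumed to be a square list of lists.

```python
def diagonal_suprema(a):
    """Return {k: d_k}, where d_k = max |a[i][j]| over all i - j = k (finite square matrix)."""
    n = len(a)
    d = {}
    for i in range(n):
        for j in range(n):
            k = i - j
            d[k] = max(d.get(k, 0.0), abs(a[i][j]))
    return d


def wiener_norm(a):
    """Sum of the diagonal suprema, i.e. the norm defined in (1) for a finite matrix."""
    return sum(diagonal_suprema(a).values())


def bandwidth(a):
    """Smallest k such that all entries with |i - j| > k vanish."""
    nonzero = [abs(k) for k, dk in diagonal_suprema(a).items() if dk != 0]
    return max(nonzero, default=0)


example = [[1, 2], [3, 4]]          # diagonals: k=0 -> {1, 4}, k=1 -> {3}, k=-1 -> {2}
print(diagonal_suprema(example))    # d_0 = 4, d_1 = 3, d_{-1} = 2
print(wiener_norm(example))         # 4 + 3 + 2
```

For infinite matrices the maxima become suprema and the sum may diverge, so this only illustrates the definition on finite sections.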

It is easy to see that this defines a norm on $BO$ with $\|A\| \le [\![A]\!]$ for all $A\in BO$. Let us this time pass to the completion of $BO$ in the stronger norm $[\![\cdot]\!]$; what we get is a proper subset of $BDO$ that shall be denoted by $W$. Equivalently, $A\in W$ iff $[\![A]\!] < \infty$, where $[\![A]\!]$ is as defined in (1) but now for arbitrary infinite matrices. It turns out that $(W, [\![\cdot]\!])$ is a Banach algebra that is often referred to as the Wiener algebra. If we, for a moment, generalize our setting from operators on $\ell^2$ to operators on $\ell^p$ with $p\in[1,\infty]$, it is clear that the class $BO$ does not depend on $p$. Unlike the norm closure $BDO(\ell^p)$ of $BO$, the Wiener algebra $W$ is also independent of $p$ since it is defined merely in terms of matrix entries. One has, for all $p\in[1,\infty]$, that

$$W \subset BDO(\ell^p) \quad\text{with}\quad \|A\| \le [\![A]\!] \tag{2}$$

for all $A\in W$. We will give an elementary proof of the inverse closedness of $W$, thereby automatically proving Wiener's famous theorem on functions with absolutely summable Fourier series.



2 BDO is inverse closed 

The shortest proof of the inverse closedness of $BDO$ goes like this: By its definition, $BDO$ is a Banach subalgebra of $BL$ that is closed under the involution map $A\mapsto A^*$. Since, moreover, the equality $\|A^*A\| = \|A\|^2$ holds for all $A\in BL$, both $BL$ and $BDO$ are $C^*$-algebras, and a basic theorem [4] on $C^*$-algebras says that therefore $BDO$ is inverse closed in $BL$, i.e. if $A\in BDO$ is invertible in $BL$ one always has $A^{-1}\in BDO$. In this section we will find out how to approximate $A^{-1}$ by band matrices, and how good this approximation is.

In order to distinguish between banded matrices of different band widths $k\in\mathbb{N}_0 := \mathbb{N}\cup\{0\}$, we will introduce the notation $BO_k$ for the set of all $A\in BL$ whose matrix $(a_{ij})_{i,j\in\mathbb{I}}$ is supported on the diagonals numbered $-k,\dots,k$ only, that means $a_{ij} = 0$ if $|i-j| > k$, i.e. $d_n = 0$ if $|n| > k$ with $d_n$ from (1). Clearly, we have $BO_k \subset BO_{k+1}$, $BO = \bigcup_{k\ge 0} BO_k$ and

$$A\in BDO \quad\text{iff}\quad 0 = \operatorname{dist}(A, BO) = \operatorname{dist}\Big(A, \bigcup_{k=0}^{\infty} BO_k\Big) = \lim_{k\to\infty}\operatorname{dist}(A, BO_k) \tag{3}$$

with the usual definition of the distance, $\operatorname{dist}(A, S) := \inf_{B\in S}\|A-B\|$, of an operator $A\in BL$ from a set $S\subset BL$. Note that if $a_{ij}$ is a matrix entry of $A$ with $|i-j| > k$ then clearly $a_{ij}$ is still a matrix entry of $A-B$ for all $B\in BO_k$ so that $\|A-B\| \ge |a_{ij}|$. Consequently,

$$\operatorname{dist}(A, BO_k) = \inf_{B\in BO_k}\|A-B\| \ge |a_{ij}|, \qquad |i-j| > k,$$

holds, i.e. $\operatorname{dist}(A, BO_k)$ is a bound on all matrix entries outside the $-k,\dots,k$ band of $A$. Using the diagonal suprema introduced in (1), we can rephrase this as

$$\operatorname{dist}(A, BO_k) \ge \sup\{|a_{ij}| : i,j\in\mathbb{I},\ i-j=n\} = d_n, \qquad |n| > k. \tag{4}$$

We start with the simple case when $A$ is banded and self-adjoint positive definite, i.e. $A\in BO$, $A = A^*$ and the spectrum of $A$, $\operatorname{sp}A$, is strictly positive. In this case, it is well known that

$$M := \sup_{\lambda\in\operatorname{sp}A}|\lambda| = \varrho(A) = \|A\|, \qquad m := \inf_{\lambda\in\operatorname{sp}A}|\lambda| = 1/\varrho(A^{-1}) = 1/\|A^{-1}\|$$

with $\varrho(A)$ denoting the spectral radius of $A$. Moreover, we have that $\kappa := M/m = \|A\|\,\|A^{-1}\|$ is the condition number of $A$.

Lemma 2.1 Let $A\in BO_k$ for some $k\in\mathbb{N}_0$ be self-adjoint positive definite, and define $M$, $m$ and $\kappa$ as above. Then, for every $n\in\mathbb{N}_0$, it holds that

$$\operatorname{dist}(A^{-1}, BO_{n\cdot k}) \le \frac{1}{m}\left(\frac{M-m}{M+m}\right)^{n+1} = \frac{\kappa}{M}\left(\frac{\kappa-1}{\kappa+1}\right)^{n+1} \tag{5}$$

where an approximation $B_n\in BO_{n\cdot k}$ of $A^{-1}$ with this accuracy is given in (6) below. In particular, $A^{-1}\in BDO$ since the right-hand side of (5) goes to zero as $n\to\infty$.

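Proof. We start by looking for a $\gamma\in\mathbb{R}$ such that

$$\|I-\gamma A\| = \varrho(I-\gamma A) = \sup_{\lambda\in\operatorname{sp}A}|\gamma\lambda-1| = \max(|\gamma m-1|,\ |\gamma M-1|)$$

is minimized. A little thought shows that this is the case iff $|\gamma m-1| = |\gamma M-1|$, i.e. $1$ is the midpoint of the interval $[\gamma m, \gamma M]$ so that $\gamma = \frac{2}{M+m}$. In this case,

$$\|I-\gamma A\| = 1-\gamma m = 1-\frac{2m}{M+m} = \frac{M-m}{M+m} = \frac{\kappa-1}{\kappa+1} = 1-\frac{2}{\kappa+1} < 1.$$

Now, by Neumann series, for every $n\in\mathbb{N}_0$,

$$A^{-1} = \gamma(\gamma A)^{-1} = \gamma\sum_{j=0}^{\infty}(I-\gamma A)^j = \underbrace{\gamma\sum_{j=0}^{n}(I-\gamma A)^j}_{=:\,B_n} + \underbrace{\gamma\sum_{j=n+1}^{\infty}(I-\gamma A)^j}_{=:\,C_n} \tag{6}$$

holds with $B_n\in BO_{n\cdot k}$ and

$$\|C_n\| \le |\gamma|\sum_{j=n+1}^{\infty}\|I-\gamma A\|^j = \frac{2}{M+m}\cdot\frac{\left(\frac{M-m}{M+m}\right)^{n+1}}{1-\frac{M-m}{M+m}} = \frac{1}{m}\left(\frac{M-m}{M+m}\right)^{n+1} = \frac{\kappa}{M}\left(\frac{\kappa-1}{\kappa+1}\right)^{n+1}, \tag{7}$$

which finishes the proof.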
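The truncated Neumann series $B_n = \gamma\sum_{j=0}^{n}(I-\gamma A)^j$ with $\gamma = 2/(M+m)$ can be tried out on a finite matrix. The sketch below is only an illustration under stated assumptions: the test matrix tridiag(-1, 4, -1) is our own choice (not from the text), its extreme eigenvalues are taken from the closed form $4 - 2\cos(j\pi/(N+1))$, and all helper names are hypothetical. It uses the identity $AB_n = I - (I-\gamma A)^{n+1}$, so every entry of the residual is bounded by $((\kappa-1)/(\kappa+1))^{n+1}$.

```python
import math


def matmul(x, y):
    """Plain dense matrix product for lists of lists."""
    inner, cols = len(y), len(y[0])
    return [[sum(row[l] * y[l][j] for l in range(inner)) for j in range(cols)] for row in x]


def identity(size):
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


def neumann_approx_inverse(a, big_m, small_m, n_terms):
    """B_n = gamma * sum_{j=0}^{n} (I - gamma*A)^j with gamma = 2 / (M + m)."""
    size = len(a)
    gamma = 2.0 / (big_m + small_m)
    eye = identity(size)
    r = [[eye[i][j] - gamma * a[i][j] for j in range(size)] for i in range(size)]
    power = eye                       # current power (I - gamma*A)^j
    total = [row[:] for row in eye]   # running sum of the powers
    for _ in range(n_terms):
        power = matmul(power, r)
        total = [[total[i][j] + power[i][j] for j in range(size)] for i in range(size)]
    return [[gamma * v for v in row] for row in total]


def bandwidth(x):
    """Largest |i - j| carrying a nonzero entry."""
    return max((abs(i - j) for i, row in enumerate(x) for j, v in enumerate(row) if v != 0.0),
               default=0)


# Illustrative test matrix (an assumption): tridiag(-1, 4, -1), band width k = 1.
N = 8
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(N)] for i in range(N)]
m = 4.0 - 2.0 * math.cos(math.pi / (N + 1))   # smallest eigenvalue
M = 4.0 + 2.0 * math.cos(math.pi / (N + 1))   # largest eigenvalue
kappa = M / m

n = 3
B_n = neumann_approx_inverse(A, M, m, n)
residual = matmul(A, B_n)                      # should equal I - (I - gamma*A)^(n+1)
max_residual = max(abs((1.0 if i == j else 0.0) - residual[i][j])
                   for i in range(N) for j in range(N))
rate = ((kappa - 1.0) / (kappa + 1.0)) ** (n + 1)
print(bandwidth(B_n), max_residual, rate)
```

The band width of `B_n` grows linearly with `n` while the residual decays geometrically, which is the trade-off expressed by (5).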
We now pass to the non-self-adjoint case, but still banded.

Proposition 2.2 Let $A\in BO_k$ for some $k\in\mathbb{N}_0$ be invertible and again put $M := \|A\|$, $m := 1/\|A^{-1}\|$ and $\kappa := M/m = \|A\|\,\|A^{-1}\|$. Then, for every $n\in\mathbb{N}_0$, it holds that

$$\operatorname{dist}(A^{-1}, BO_{3nk}) \le \frac{\kappa^2}{M}\left(\frac{\kappa^2-1}{\kappa^2+1}\right)^{n+1} \tag{8}$$

where an approximation of $A^{-1}$ in $BO_{3nk}$ with this accuracy is given in (9) below. In particular, $A^{-1}\in BDO$ since the right-hand side of (8) goes to zero as $n\to\infty$.

Proof. The idea is to write $A^{-1} = B^{-1}A^*$, where $B := A^*A\in BO_{2k}$ is clearly self-adjoint positive definite, and to approximate $B^{-1}$ as in the previous lemma. When we apply Lemma 2.1 to $B$ (in place of $A$), note that

$$M_B := \|B\| = \|A^*A\| = \|A\|^2 = M^2,$$

$$m_B := 1/\|B^{-1}\| = 1/\|A^{-1}(A^{-1})^*\| = 1/\|A^{-1}\|^2 = m^2 \quad\text{and}$$

$$\kappa_B := M_B/m_B = M^2/m^2 = \kappa^2.$$

Now for every $n\in\mathbb{N}_0$, in analogy to (6), we can write $B^{-1} = B_n + C_n$ with $B_n\in BO_{2nk}$ and $\|C_n\|$ bounded as in (7), so that $A^{-1} = B^{-1}A^* = B_nA^* + C_nA^*$ with

$$B_nA^* = \gamma_B\sum_{j=0}^{n}(I-\gamma_B B)^jA^* = \frac{2}{M^2+m^2}\sum_{j=0}^{n}\Big(I-\frac{2}{M^2+m^2}A^*A\Big)^jA^* \in BO_{3nk} \tag{9}$$

and

$$\|C_nA^*\| \le \|C_n\|\,\|A^*\| \le \frac{1}{m_B}\left(\frac{M_B-m_B}{M_B+m_B}\right)^{n+1}\|A\| = \frac{M}{m^2}\left(\frac{M^2-m^2}{M^2+m^2}\right)^{n+1} = \frac{\kappa^2}{M}\left(\frac{\kappa^2-1}{\kappa^2+1}\right)^{n+1},$$

which proves the result. ■

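Finally, we pass to the most general case, $A\in BDO$.

Theorem 2.3 Let $A\in BDO$ be invertible, put $\delta_k := \operatorname{dist}(A, BO_k)$ for $k = 0,1,2,\dots$ and again let $M := \|A\|$, $m := 1/\|A^{-1}\|$ and $\kappa := M/m = \|A\|\,\|A^{-1}\|$ as well as $\alpha_k := m/(m-2\delta_k)$. Note that, by (3), $\delta_k\to 0$ and $\alpha_k\to 1$ as $k\to\infty$. Then, for all $n\in\mathbb{N}_0$ and all sufficiently large $k\in\mathbb{N}_0$ (such that $\delta_k < m/2$), it holds that

$$\operatorname{dist}(A^{-1}, BO_{3nk}) \le \frac{\alpha_k}{m^2}\left(2\delta_k + \alpha_k(M+2\delta_k)\left[\frac{(\kappa_k^+)^2-1}{(\kappa_k^-)^2+1}\right]^{n+1}\right) = \frac{2\delta_k\alpha_k}{m^2} + \frac{(\kappa_k^+)^2}{M+2\delta_k}\left[\frac{(\kappa_k^+)^2-1}{(\kappa_k^-)^2+1}\right]^{n+1}$$

with $\kappa_k^{\pm}$ defined by (11), respectively. In particular, $A^{-1}\in BDO$.

Proof. For every $k\in\mathbb{N}_0$ pick an $A_k\in BO_k$ with $\|A-A_k\| \le 2\delta_k$. Then $A_k\to A$ (norm convergence in $BL$) as $k\to\infty$ since $\delta_k\to 0$. Since $A$ is invertible we know that $A_k$ is invertible for sufficiently large $k$; precisely, take $k_0\in\mathbb{N}_0$ big enough that $\delta_k < m/2$ for all $k\ge k_0$ so that $\|A^{-1}(A-A_k)\| \le 2\delta_k/m < 1$. Then $A_k = A(I-A^{-1}(A-A_k))$ is invertible with

$$\|A_k^{-1}\| = \Big\|\sum_{j=0}^{\infty}\big(A^{-1}(A-A_k)\big)^jA^{-1}\Big\| \le \sum_{j=0}^{\infty}\Big(\frac{2\delta_k}{m}\Big)^j\|A^{-1}\| = \frac{1}{1-\frac{2\delta_k}{m}}\|A^{-1}\| = \alpha_k\|A^{-1}\|$$

and hence

$$\|A_k^{-1}-A^{-1}\| = \|A^{-1}(A-A_k)A_k^{-1}\| \le \|A^{-1}\|\,\|A-A_k\|\,\|A_k^{-1}\| \le \frac{1}{m}\,2\delta_k\,\frac{\alpha_k}{m} = \frac{2\delta_k\alpha_k}{m^2} \tag{10}$$

for all $k\ge k_0$ so that also $A_k^{-1}\to A^{-1}$ as $k\to\infty$. At this point it is clear that $A^{-1}\in BDO$ since $A_k^{-1}\in BDO$ by Proposition 2.2 and since $BDO$ is closed.

For an explicit approximation of $A^{-1}$ by banded matrices, it remains to look at banded approximations of $A_k^{-1}$ and to use (10). Therefore, for every $k\ge k_0$, let $B_0^{(k)}, B_1^{(k)}, \dots$ be the banded approximations of $A_k^{-1}$ from Proposition 2.2, where $B_n^{(k)}\in BO_{3nk}$ and $\|A_k^{-1}-B_n^{(k)}\|$ is bounded by the right-hand side of (8) with $M$, $m$ and $\kappa$ replaced by $M_k := \|A_k\|$, $m_k := 1/\|A_k^{-1}\|$ and $\kappa_k := M_k/m_k$, respectively. Finally, put

$$\kappa_k^{\pm} := \frac{M\pm 2\delta_k}{m/\alpha_k} = \alpha_k\,\frac{M\pm 2\delta_k}{m} \tag{11}$$

for every $k\ge k_0$. From $\|A-A_k\| \le 2\delta_k$ and $\|A_k^{-1}\| \le \alpha_k\|A^{-1}\|$ we get $M-2\delta_k \le M_k \le M+2\delta_k$ and $m_k \ge m/\alpha_k$. Hence

$$\|A_k^{-1}-B_n^{(k)}\| \le \frac{M_k}{m_k^2}\left(\frac{M_k^2-m_k^2}{M_k^2+m_k^2}\right)^{n+1} \le \frac{M+2\delta_k}{(m/\alpha_k)^2}\left(\frac{(M+2\delta_k)^2-(m/\alpha_k)^2}{(M-2\delta_k)^2+(m/\alpha_k)^2}\right)^{n+1} = \frac{(\kappa_k^+)^2}{M+2\delta_k}\left(\frac{(\kappa_k^+)^2-1}{(\kappa_k^-)^2+1}\right)^{n+1}. \tag{12}$$

Bounding $\|A^{-1}-B_n^{(k)}\| \le \|A^{-1}-A_k^{-1}\| + \|A_k^{-1}-B_n^{(k)}\|$ by (10)+(12) completes the proof.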



3 The Wiener algebra W is inverse closed 



The term 'Wiener algebra' is commonly used for the set $W(\mathbb{T})$ of all functions $f(t) = \sum_{n\in\mathbb{Z}} f_n t^n$ on the unit circle $\mathbb{T}$ whose sequence of Fourier coefficients $\hat f = (f_n)_{n\in\mathbb{Z}}$ is in $\ell^1$ (two-sided infinite), equipped with pointwise addition and multiplication and with the norm $[\![f]\!] := \sum_{n\in\mathbb{Z}} |f_n|$. Norbert Wiener's famous theorem says that if $f\in W(\mathbb{T})$ is invertible as a continuous function, i.e. $f$ vanishes nowhere on $\mathbb{T}$, then $f^{-1} = 1/f$ is in $W(\mathbb{T})$ as well, showing that $W(\mathbb{T})$ is inverse closed.

In fact, Wiener's theorem is a special case of our Theorem 3.1 saying that 'our' Wiener algebra $W$ is inverse closed! It follows if we apply Theorem 3.1 to a two-sided infinite matrix with constant diagonals. To see this take $\mathbb{I} = \mathbb{Z}$ and associate with every function $f\in W(\mathbb{T})$ the so-called Laurent matrix

$$L(f) = \begin{pmatrix} \ddots & \ddots & \ddots & & \\ \ddots & f_0 & f_{-1} & f_{-2} & \\ \ddots & f_1 & f_0 & f_{-1} & \ddots \\ & f_2 & f_1 & f_0 & \ddots \\ & & \ddots & \ddots & \ddots \end{pmatrix} \tag{13}$$

which is sometimes also (wrongly) called a 'two-sided infinite Toeplitz matrix'. Then, clearly,

$$L(f)\in W \quad\text{and}\quad [\![L(f)]\!] = \sum_{n\in\mathbb{Z}} |f_n| = [\![f]\!].$$

Now for $g\in L^2(\mathbb{T})$, let $\hat g = (g_n)_{n\in\mathbb{Z}}\in\ell^2$ denote its sequence of Fourier coefficients and note that $L(f)\hat g = \hat f * \hat g = \widehat{fg}$, i.e. $L(f)$ acts as the operator of convolution by the sequence $\hat f = (f_n)$. In other words, if $F : L^2(\mathbb{T})\to\ell^2$ is the Fourier transform $g\mapsto\hat g = (g_n)$ then

$$L(f) = F\,M(f)\,F^{-1}$$

with $M(f) : L^2(\mathbb{T})\to L^2(\mathbb{T})$ denoting the multiplication operator $g\mapsto fg$. The latter formula shows that $L(f)$ is invertible on $\ell^2$ iff $M(f)$ is invertible on $L^2(\mathbb{T})$ which is clearly the case iff $f$ has no zeros on $\mathbb{T}$. In this case $(L(f))^{-1} = F\,M(f^{-1})\,F^{-1} = L(f^{-1})$ holds, and Wiener's statement, $f^{-1}\in W(\mathbb{T})$, is equivalent to the inverse Laurent matrix $(L(f))^{-1} = L(f^{-1})$ being in $W$. Our Theorem 3.1 however says much more: For all matrices $A\in W$, not just for those with constant diagonals, the inverse $A^{-1}$, if it exists, is in $W$.

Theorem 3.1 If $A\in W$ is invertible then $A^{-1}\in W$.

Proof. a) As in §2, we start with the case $A\in BO$, say $A\in BO_k$ for some $k\in\mathbb{N}_0$. Then, by (8),

$$\operatorname{dist}(A^{-1}, BO_{3nk}) \le r_n := \frac{\kappa^2}{M}\left(\frac{\kappa^2-1}{\kappa^2+1}\right)^{n+1}$$

for every $n\in\mathbb{N}_0$, where $M = \|A\|$, $m = 1/\|A^{-1}\|$ and $\kappa = M/m = \|A\|\,\|A^{-1}\|$. Now, for every $j\in\mathbb{Z}$, let $d_j$ denote the supremum norm of the $j$-th diagonal of $A^{-1}$. By the previous inequality and (4) we get that

$$d_j \le r_0 \quad\text{for } j\in\{\pm 1,\dots,\pm 3k\},$$

$$d_j \le r_1 \quad\text{for } j\in\{\pm(3k+1),\dots,\pm 6k\},$$

$$d_j \le r_2 \quad\text{for } j\in\{\pm(6k+1),\dots,\pm 9k\}, \quad\dots$$

Summing up we have

$$[\![A^{-1}]\!] = \sum_{j\in\mathbb{Z}} d_j \le d_0 + 6kr_0 + 6kr_1 + 6kr_2 + \dots = d_0 + 6k(r_0+r_1+r_2+\dots) < \infty$$

since $r_n$ decays exponentially. So $A^{-1}\in W$ if $A\in BO$.

b) Now let $A\in W$ be invertible and take $A_1, A_2, \dots\in BO$ such that $[\![A-A_i]\!]\to 0$ as $i\to\infty$. Since $(W, [\![\cdot]\!])$ is a Banach algebra we know that for sufficiently large $i$ also $A_i$ is invertible and $[\![A^{-1}-A_i^{-1}]\!]\to 0$ as $i\to\infty$. Together with part a) and the closedness of $(W, [\![\cdot]\!])$ this proves the theorem. ■
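The summation in part a) can be carried out explicitly. As a sketch, write $q := (\kappa^2-1)/(\kappa^2+1)$ (our abbreviation, not used elsewhere in the text), so that the bound from (8) reads $r_n = \frac{\kappa^2}{M}q^{n+1}$ and $\frac{q}{1-q} = \frac{\kappa^2-1}{2}$. Using in addition that $d_0 \le \|A^{-1}\| = \kappa/M$, the geometric series gives, for invertible $A\in BO_k$:

```latex
\sum_{n=0}^{\infty} r_n
  \;=\; \frac{\kappa^2}{M}\cdot\frac{q}{1-q}
  \;=\; \frac{\kappa^2(\kappa^2-1)}{2M},
\qquad\text{hence}\qquad
[\![A^{-1}]\!]
  \;\le\; d_0 + 6k\sum_{n=0}^{\infty} r_n
  \;\le\; \frac{\kappa}{M} + \frac{3k\,\kappa^2(\kappa^2-1)}{M}.
```

This is only as sharp as the diagonal counting in part a); it is meant to make the dependence on the band width $k$ and the condition number $\kappa$ visible.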

Note that one corollary of Theorem 3.1 is that if, for some fixed $p\in[1,\infty]$, an operator $A : \ell^p\to\ell^p$ with matrix representation in $W$ is invertible then its inverse is again given by a matrix in $W$ and $A^{-1}$ therefore (see (2)) acts boundedly on all spaces $\ell^p$ with $p\in[1,\infty]$. So invertibility and spectrum of such operators $A$ do not depend on the particular choice of $p$. In [15] it is shown how this result can be used to prove that even the property of being a Fredholm operator (including the value of the Fredholm index) and hence the essential spectrum of $A$ does not depend on $p\in[1,\infty]$ if $A\in W$.

4 Some generalizations 
4.1 Generalized banding 

It is easy to see that our results generalize well beyond $BDO$ and $W$. To see what we mean, let $\varrho$ be a metric on $\mathbb{I}$ and define the set $BO^{\varrho}$ of $\varrho$-generalized banded operators as the set of all $A = (a_{ij})_{i,j\in\mathbb{I}}\in BL$ with $a_{ij} = 0$ for all $i,j\in\mathbb{I}$ with $\varrho(i,j) > k$, for some fixed $k$. It is easy to see that $BO^{\varrho}$ is also closed under addition and multiplication and taking adjoints. Examples of interesting metrics $\varrho$ other than $\varrho(i,j) = |i-j|$ are obtained by taking a sequence $(x_i)_{i\in\mathbb{I}}$ of pairwise different elements from another metric space $(X, d)$ (e.g. $X = \mathbb{R}^n$ with the Euclidean norm) and putting

$$\varrho(i,j) := d(x_i, x_j), \qquad i,j\in\mathbb{I}.$$

These generalizations are interesting in statistics as we shall see in Section 6. If we let $BDO^{\varrho}$ be the closure of $BO^{\varrho}$ in $BL(\ell^2)$, Lemma 2.1, Proposition 2.2 and Theorem 2.3 go over verbatim to $BO^{\varrho}$ and $BDO^{\varrho}$.

If we now define the generalized Wiener norm by

$$[\![A]\!]_{\varrho} := \sum_{k\in\mathbb{Z}} d_k^{\varrho},$$

where $d_k^{\varrho} = \sup\{|a_{ij}| : i,j\in\mathbb{I},\ \varrho(i,j) = k\}$, we can obtain an exact generalization of Theorem 3.1. An obvious application of this result is the generalization of Wiener's theorem to analytic functions

$$f : t\mapsto\sum_{n\in\mathbb{Z}^p} f_n t^n$$

of several variables, where $t = (t_1,\dots,t_p)\in\mathbb{T}^p$, $n = (n_1,\dots,n_p)\in\mathbb{Z}^p$ and $t^n = \prod_{j=1}^{p} t_j^{n_j}$.
4.2 Banding up to a permutation 

Let $\pi$ be a permutation, that is, a 1-1 mapping of $\mathbb{I}$ onto itself which is clearly representable as an operator $(x_i)_{i\in\mathbb{I}}\mapsto(x_{\pi(i)})_{i\in\mathbb{I}}$ of norm 1 from $\ell^2$ to $\ell^2$, which we also denote by $\pi$. Define

$$BO^{(\pi)} = \{A : \pi A\pi^*\in BO\},$$

where $\pi^*$ is the adjoint/inverse of $\pi$. $BO^{(\pi)}$ is closed under addition and multiplication but not under taking adjoints since $A\in BO^{(\pi)} \Rightarrow A^*\in BO^{(\pi^*)}$. So we can't expect a generalization of Theorem 2.3. We can, however, generalize Lemma 2.1 and obtain a special case of Theorem 2.3. Specifically, let

$$HBO^{(\pi)} := \{A\in BO^{(\pi)} : A \text{ is hermitian positive definite}\}$$

and HBDO^''' be its closure. Then Theorem 2.3 generalizes to HBDO^''' if we replace B^^l{ by bI^J (with 
obvious notation changes). 

We can also in an obvious way obtain the same conclusions for generalized banding. More important from a statistical point of view is the following generalization. Let

$$HBO^{PERM} := \bigcup_{\pi} HBO^{(\pi)}.$$

Then $HBO^{PERM}$ is only closed for addition, scalar multiplication, and taking powers. However, an examination of the proof of Theorem 2.3 shows that these properties are sufficient to arrive at the same generalization for $HBO^{PERM}$ and its closure $HBDO^{PERM}$ as we did for $HBO^{(\pi)}$ and its closure. Again, everything carries over verbatim to generalized banding. These results, particularly the last, are of interest in statistics since, for reasons to become apparent, it is desirable to define classes of covariance matrices such that matrices and their inverses necessarily obey the same definition of sparseness.



5 Applications to probability theory 

5.1 The closures of banded self-adjoint positive definite operators and Gaussian processes

A Gaussian process is a sequence of random variables, $\{X_j : j\in\mathbb{Z}\}$, on a probability space whose finite dimensional joint distributions are Gaussian. Without loss of generality, we take $EX_j = 0$ for all $j$, so that the joint distributions are determined by the matrices

$$\Sigma_{m,n} := E\,X_m^n[X_m^n]^T = [EX_iX_j]_{i,j=m}^{n}, \qquad m,n\in\mathbb{Z},$$

where we put $X_m^n := (X_m,\dots,X_n)^T$. We extend these notations to $m = -\infty$ and $n = \infty$ by introducing the two-sided infinite vector $X_{-\infty}^{\infty} := (\dots, X_0, X_1, \dots)^T$ and matrix

$$\Sigma := E\,X_{-\infty}^{\infty}[X_{-\infty}^{\infty}]^T = [EX_iX_j]_{i,j=-\infty}^{\infty}.$$

For the rest of our discussion we assume that the infinite matrix $\Sigma$ acts as a bounded operator from $\ell^2$ to $\ell^2$. A regular process is one for which

oo 

EeB-o. ■■= n ^-oo =^ F(i?)e{0,l}, 

i— — oo 

where Bl\ is the cr field generated by X"^ or equivalently 

hm sup||PMB)-PM)P(B)| : B Bl^] =0 

for aU A G S-oo and all p. A strongly mixing process is one for which 

lim (3{p) = 0, where /3(p) = sup - P{A)P{B)\ : A e 6™^, B e 6^, m e Z| . 

p—^oo ^ ^ ^ 

Let $P_{m,n}$ be the joint distribution of $X_m^n$, the probability measure induced on $\mathcal{B}_m^n$ by $P$. Let $P^{m,p}$ be the regular conditional probability measure on $\mathcal{B}_{m+p}^{\infty}$ given $\mathcal{B}_{-\infty}^{m}$. If $\|\mu\|_{TV}$ denotes the total variation of a signed measure $\mu$, let

A beta mixing (or absolutely regular) process is one such that

$$\lim_{p\to\infty}\sup_{m}\beta(m,p) = 0.$$

As noted in [5], beta mixing implies strong mixing. The converse is not true as Example 5.3 below shows. 
On the other hand, let $Q_{m,n}$ be the probability distribution induced by $P$ on the $\sigma$ field generated by $\mathcal{B}_{-\infty}^{m}$ and $\mathcal{B}_{n}^{\infty}$. Let $Q^0_{m,n}$ be the product probability on the same $\sigma$ field with marginals $P_{-\infty,m}$, $P_{n,\infty}$. Then Lemma 2 of [5, p. 118] states

= \\\Qm,',n+p-Q„^,,n+p\\TV■ (14) 

A mean 0 process is linearly regular if

$$E(X_{p+m+1}\,|\,X_{-\infty}^{m}) \to 0$$

as $p\to\infty$, uniformly in $m$. A detailed discussion of these concepts is in Ibragimov and Rozanov [5], primarily in the context of stationary processes.

For Gaussian processes, linear regularity and regularity are equivalent, see [5, p. 112]. We connect with 
our previous results via 

Theorem 5.1 If $\Sigma$ has a bounded inverse and belongs to $BDO$ then $X_{-\infty}^{\infty}$ is regular, and so is the process corresponding to $\Sigma^{-1}$.

Proof. Recall $X_a^b = (X_a, X_{a+1}, \dots, X_b)^T$ and let

$$\Sigma(a,b) = E\,X_a^b[X_a^b]^T = [EX_iX_j]_{i,j=a}^{b}$$

denote the $(b-a+1)\times(b-a+1)$ diagonal block of $\Sigma$. Moreover let

$$\sigma(a,b,c) = E\,X_a^bX_c.$$

Then

$$E(X_{n+p+1}\,|\,X_{-m}^{n}) = [X_{-m}^{n}]^T\,\Sigma^{-1}(-m,n)\,\sigma(-m,n,n+p+1),$$

$$E\big[E(X_{n+p+1}\,|\,X_{-m}^{n})\big]^2 = \sigma^T(-m,n,n+p+1)\,\Sigma^{-1}(-m,n)\,\sigma(-m,n,n+p+1). \tag{15}$$

Since $\Sigma$ is a member of $BDO$ we can find $B_\varepsilon$ banded of width $k(\varepsilon)$ such that $\|B_\varepsilon-\Sigma\| \le \varepsilon$. Let $(\dots, X_1(\varepsilon), \dots, X_{n+p+1}(\varepsilon), \dots)$ be a Gaussian process with covariance operator $B_\varepsilon$ (a moving average process). Then,

$$\|B_\varepsilon(-m,n)-\Sigma(-m,n)\| \le \varepsilon,$$

$$|\sigma_\varepsilon(-m,n,n+p+1)-\sigma(-m,n,n+p+1)| \le \varepsilon, \tag{16}$$

where $\sigma_\varepsilon(a,b,c) = E\,X_a^b(\varepsilon)X_c(\varepsilon)$ and $|\cdot|$ is the Euclidean norm. By construction,

$$\sigma_\varepsilon(-m,n,n+p+1) = 0 \quad\text{if } p > k(\varepsilon). \tag{17}$$

Then, from (16) and (17),

$$|\sigma^T(-m,n,n+p+1)\,\Sigma^{-1}(-m,n)\,\sigma(-m,n,n+p+1)| \le \varepsilon\,\|\Sigma^{-1}\|\,\varepsilon \tag{18}$$

for all $p > k(\varepsilon)$. The main result follows from (15) and (18). The corresponding statement for $\Sigma^{-1}$ is a consequence of Theorem 2.3. ■

5.2 Beta mixing and Frobenius closure of banded operators

In general, $\Sigma\in BDO$ having a bounded inverse does not imply beta mixing (see Example 5.3 below). But below we prove that beta mixing is equivalent to a condition on the off-diagonal decay of $\Sigma$ which can be related to the closure of $BO$ in a type of Frobenius norm.

In the negative, there are results of Kolmogorov and Rozanov [8] for symmetric positive definite Toeplitz matrices $\Sigma$, i.e. one-sided infinite versions of (13) with $f_n = f_{-n}$, showing that strong (and hence beta) mixing does not hold if the associated symbol $f : t\in\mathbb{T}\mapsto\sum_{n\in\mathbb{Z}} f_n t^n$ has discontinuities of the first kind. (Recall from Section 3 that $\Sigma$ is: bounded iff $f\in L^{\infty}(\mathbb{T})$, invertible iff $1/f\in L^{\infty}(\mathbb{T})$, and it is in $BDO$ iff $f$ is continuous.)

On the positive side, Ibragimov and Rozanov [5, p. 129] establish 

Theorem 5.2 If $f$ is the symbol of a (one- or two-sided infinite) Toeplitz matrix $\Sigma$ corresponding to a stationary Gaussian process $X$ and

$$f(e^{ix}) = |P(e^{ix})|^2\,a(x), \qquad x\in(-\pi,\pi),$$

where $P$ is a polynomial with zeros, if any, only on the unit circle and $\log a(\cdot)$ belongs to the Sobolev space $W^{1/2,2}$, then $X$ is beta mixing and conversely.

Recall that

$$W^{1/2,2} = \Big\{ b(x) = \sum_{k\in\mathbb{Z}} a_k e^{ikx} \ :\ \sum_{k\in\mathbb{Z}} |k|\,|a_k|^2 < \infty \Big\}.$$

Note that if $f$ is bounded above and away from zero then one can take $P = 1$, and the condition $\log a(\cdot)\in W^{1/2,2}$ is equivalent to $a\in W^{1/2,2}$ and $a$ bounded away from zero. To see the latter, note that $W^{1/2,2}$ can be equivalently characterized by the Sobolev-Slobodeckij norm (e.g. [18]), in which it becomes clear that with $f$ also powers of $f$ and hence, by closedness, also $\log f$ (if $f$ is bounded above and away from zero) and $\exp f$ are in $W^{1/2,2}$.

Here is an example of a strong but not beta mixing stationary process. 

Example 5.3 For $k\in\mathbb{Z}$, let $a_k = 1/\sqrt{|k|}$ if $|k|\in\{1^4, 2^4, 3^4, \dots\}$, and $a_k = 0$ otherwise. Now look at the symbol function $f(t) = \sum_{k\in\mathbb{Z}} a_k t^k$ defined on the unit circle $\mathbb{T}$. Because of $a_k = a_{-k}$, the symbol $f$ is real-valued. One moreover has

$$\sum_{k\in\mathbb{Z}} |a_k| = 2\sum_{m\in\mathbb{N}}\frac{1}{\sqrt{m^4}} = 2\sum_{m\in\mathbb{N}}\frac{1}{m^2} = \frac{\pi^2}{3}$$

so that $f$ is in the Wiener class $W(\mathbb{T})$ that we discuss at the beginning of Section 3, and hence $f$ is continuous. In particular, $f$ is bounded with

$$|f(t)| = \Big|\sum_{k\in\mathbb{Z}} a_k t^k\Big| \le \sum_{k\in\mathbb{Z}} |a_k| = \frac{\pi^2}{3}$$

so that $f(t)\in[-\frac{\pi^2}{3}, \frac{\pi^2}{3}]\subset(-4,4)$ for all $t\in\mathbb{T}$. However, $f$ is not in the Sobolev space $W^{1/2,2}$ since

$$\sum_{k\in\mathbb{Z}} |k|\,|a_k|^2 = 2\sum_{m\in\mathbb{N}} m^4\cdot\frac{1}{m^4} = \infty.$$

Putting $g(t) := f(t)+4$, we get that the Fourier coefficients $b_k$ of $g$ coincide with $a_k$, except $b_0$ which is 4. Now the range of $g$ is in $[4-\frac{\pi^2}{3}, 4+\frac{\pi^2}{3}]\subset(0,8)$, so that $g$ is positive, bounded away from zero, and continuous, whence the associated process with covariance matrix $\Sigma = (b_{i-j})_{i,j}$ is strong mixing. But the process is not beta mixing since $g\notin W^{1/2,2}$. □
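The two series in Example 5.3 can be checked numerically. The sketch below is our own illustration, assuming the coefficients $a_k = 1/\sqrt{|k|}$ for $|k| = m^4$ and $a_k = 0$ otherwise; the helper names `a` and `partial_sums` are hypothetical. The absolute sum approaches $\pi^2/3$ from below, while the weighted sum $\sum |k|\,|a_k|^2$ grows linearly in the number of terms.

```python
import math


def a(k):
    """Coefficient a_k: 1/sqrt(|k|) if |k| is a fourth power m^4 with m >= 1, else 0."""
    k = abs(k)
    if k == 0:
        return 0.0
    root = round(k ** 0.25)
    return 1.0 / math.sqrt(k) if root ** 4 == k else 0.0


def partial_sums(m_max):
    """Partial sums over |k| = m^4, m = 1..m_max, counting both signs of k.

    Returns (sum of |a_k|, sum of |k| * |a_k|^2).
    """
    l1 = sum(2.0 * a(m ** 4) for m in range(1, m_max + 1))
    sobolev = sum(2.0 * (m ** 4) * a(m ** 4) ** 2 for m in range(1, m_max + 1))
    return l1, sobolev


l1, sobolev = partial_sums(1000)
print(l1, math.pi ** 2 / 3)   # the first value stays just below pi^2 / 3
print(sobolev)                # roughly 2 * 1000, growing without bound in m_max
```

Increasing `m_max` leaves the first value essentially unchanged and makes the second grow proportionally, matching the claim that $f$ lies in $W(\mathbb{T})$ but not in $W^{1/2,2}$.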

We now give a generalization of Theorem 5.2 to arbitrary bounded covariance matrices $\Sigma$. Implicitly, the result is essentially in Lemmas 2-5 of [5, §IV.4] but we give a full statement and proof here for completeness. We denote the entries of our infinite covariance matrix $\Sigma$ by $\sigma_{ij} = EX_iX_j$ for $i,j\in\mathbb{Z}$.

Theorem 5.4 Suppose $\Sigma\in BL$ is invertible and $E\,X_{-\infty}^{\infty} = 0$. Then $X_{-\infty}^{\infty}$ Gaussian is beta mixing iff

$$\sup_{n}\sum_{i=-\infty}^{n}\ \sum_{j=p+n}^{\infty}\sigma_{ij}^2 \to 0 \quad\text{as } p\to\infty. \tag{19}$$

A simpler sufficient condition is given by

$$\gamma(p) < \infty \ \text{ for some } p\ge 1, \quad\text{where}\quad \gamma(p) := \sum_{i=-\infty}^{\infty}\ \sum_{j=p+i}^{\infty}\sigma_{ij}^2, \tag{20}$$

since (20) $\Rightarrow$ (19).



Proof. Note that with $\Sigma$ also $\Sigma^{-1}$ is in $BL$ and put $M := \max(\|\Sigma\|, \|\Sigma^{-1}\|) < \infty$. By (14), to prove the theorem we need only show that (19) is equivalent to

sup ||Q„i,„j+p - (3„, ,„+p||ry as p^oo. (21) 

m 

Moreover, it is easy to see that (21) is equivalent to 

sup \\Qm,n,p,k~ Qm,n,p,k\\TV ^0 aS p ^ OO , (22) 

where Qm,n,p,k and Qm,n,p,k are the distributions Qm.m+p and Qm.m+p restricted to B'^'\_^'^^ . 

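We consider $X^{(1)} := (X_m,\dots,X_n)^T$ and $X^{(2)} := (X_{n+p+1},\dots,X_{n+p+k})^T$. Let $f$ denote the joint density of $(X^{(1)}, X^{(2)})$ so that $f$ corresponds to

$$S = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix},$$

the covariance matrix of $(X^{(1)}, X^{(2)})$ blocked out. Thus,

$$X^{(1)}\sim N_{n-m+1}(0,\Sigma_{11}) \quad\text{and}\quad X^{(2)}\sim N_k(0,\Sigma_{22}).$$

Let $g$ correspond to

$$S_0 = \begin{pmatrix}\Sigma_{11} & 0\\ 0 & \Sigma_{22}\end{pmatrix},$$

the covariance matrix of $(X^{(1)}, X^{(2)})$ if $X^{(1)}$ and $X^{(2)}$ are independent. Let $P_f, P_g$ be the probability distributions of $(X^{(1)}, X^{(2)})$ corresponding to $f$ and $g$, and let

$$\|P_f-P_g\|_{TV} := \frac{1}{2}\int|f-g|$$

be the variational norm. We suppress the dependence of $f, g$ on $m, n, p, k$ in what follows. Let

$$H^2(P_f,P_g) := \int\big(\sqrt{f}-\sqrt{g}\big)^2 = 2\big(1-A(f,g)\big), \qquad A(f,g) := \int\sqrt{fg},$$

be the (squared) Hellinger metric. It is well known that

$$\tfrac{1}{2}H^2(P_f,P_g) \le \|P_f-P_g\|_{TV} \le \sqrt{2}\,H(P_f,P_g).$$

Thus we can replace $\|Q_{m,n,p,k}-Q^0_{m,n,p,k}\|_{TV}$ by $H^2(Q_{m,n,p,k}, Q^0_{m,n,p,k})$ in (22).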

Our argument will consist of bounding $\frac{1}{2}H^2 = 1-A$ above and below by functions $\overline{a}_{m,n,p,k}$ and $\underline{a}_{m,n,p,k}$ of $\{\sigma_{ij}^2 : m\le i\le n,\ n+p+1\le j\le n+p+k\}$ (see (30) below) such that

$$\sup_{m,n,k}\overline{a}_{m,n,p,k}\to 0 \quad\text{and}\quad \sup_{m,n,k}\underline{a}_{m,n,p,k}\to 0 \quad\text{as } p\to\infty$$

iff (19) holds. To do this, we have to compute $A(f,g)$.

Note that $\|\cdot\|_{TV}$ and $H^2(\cdot,\cdot)$ are invariant under regular linear transformations $X^{(1)}\to T_1X^{(1)}$ and $X^{(2)}\to T_2X^{(2)}$. For the choice of these matrices $T_1$ and $T_2$, suppose, by the spectral theorem, that $\Sigma_{jj} = Q_j^T\Lambda_jQ_j$, $j = 1,2$, where $Q_j$ are orthogonal and $\Lambda_j$ are diagonal, and put

$$T_1 = \Lambda_1^{-1/2}Q_1 \quad\text{and}\quad T_2 = \Lambda_2^{-1/2}Q_2.$$

Replacing $(X^{(1)}, X^{(2)})$ by $(T_1X^{(1)}, T_2X^{(2)})$ in the above, we get that $\Sigma_{11} = I_1$ and $\Sigma_{22} = I_2$ are the $(n-m+1)\times(n-m+1)$ and $k\times k$ identity, respectively. We shall establish the theorem in this case and then derive the general case.

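If $\Sigma_{11} = I_1$ and $\Sigma_{22} = I_2$ then, in corresponding block notation,

$$S^{-1} = \begin{pmatrix}\Sigma^{11} & \Sigma^{12}\\ \Sigma^{21} & \Sigma^{22}\end{pmatrix}$$

with

$$\Sigma^{11} = (I_1-\Sigma_{12}\Sigma_{21})^{-1}, \qquad \Sigma^{22} = (I_2-\Sigma_{21}\Sigma_{12})^{-1},$$

$$\Sigma^{12} = -(I_1-\Sigma_{12}\Sigma_{21})^{-1}\Sigma_{12}, \qquad \Sigma^{21} = -(I_2-\Sigma_{21}\Sigma_{12})^{-1}\Sigma_{21}, \tag{23}$$

and the determinant of $S$ is equal to

$$|S| = |I_1-\Sigma_{12}\Sigma_{21}| = |I_2-\Sigma_{21}\Sigma_{12}| \tag{24}$$

since $\Sigma_{12}\Sigma_{21}$ and $\Sigma_{21}\Sigma_{12}$ have the same nonzero eigenvalues. It holds that

$$A(f,g) = \int(fg)^{1/2} = |S|^{-1/4}\,(2\pi)^{-(n-m+1+k)/2}\int_{\mathbb{R}^{n-m+1}\times\mathbb{R}^k}\exp\Big(-\tfrac{1}{4}x^T(S^{-1}+S_0^{-1})x\Big)\,dx = |S|^{-1/4}\,\Big|\tfrac{1}{2}(S^{-1}+S_0^{-1})\Big|^{-1/2} \tag{25}$$

and

$$S^{-1}+S_0^{-1} = \begin{pmatrix}\Sigma^{11}+I_1 & \Sigma^{12}\\ \Sigma^{21} & \Sigma^{22}+I_2\end{pmatrix} = \begin{pmatrix}(I_1-\Sigma_{12}\Sigma_{21})^{-1}+I_1 & -(I_1-\Sigma_{12}\Sigma_{21})^{-1}\Sigma_{12}\\ -(I_2-\Sigma_{21}\Sigma_{12})^{-1}\Sigma_{21} & (I_2-\Sigma_{21}\Sigma_{12})^{-1}+I_2\end{pmatrix} =: \begin{pmatrix}M_{11} & M_{12}\\ M_{21} & M_{22}\end{pmatrix}. \tag{26}$$

To compute (25), recall (24), (26) and the standard formula for block determinants

$$|S^{-1}+S_0^{-1}| = |M_{11}|\ |M_{22}-M_{21}M_{11}^{-1}M_{12}|. \tag{27}$$

W.l.o.g. suppose $n-m+1\le k$. Now let $\lambda_m,\dots,\lambda_n$ be the eigenvalues of $\Sigma_{12}\Sigma_{21}$. Then $\Sigma_{21}\Sigma_{12}$ has the same $n-m+1$ eigenvalues and the rest are zeros. Let $x$ be an eigenvector of $\Sigma_{21}\Sigma_{12}$ corresponding to $\lambda$. Then $\Sigma_{12}x$ is an eigenvector of $\Sigma_{12}\Sigma_{21}$ corresponding to the same $\lambda$. Consequently, for such an $x$, we have

$$M_{22}x = \big((I_2-\Sigma_{21}\Sigma_{12})^{-1}+I_2\big)x = \big((1-\lambda)^{-1}+1\big)x = (1-\lambda)^{-1}(2-\lambda)x$$

and

$$M_{21}M_{11}^{-1}M_{12}x = (I_2-\Sigma_{21}\Sigma_{12})^{-1}\Sigma_{21}\big((I_1-\Sigma_{12}\Sigma_{21})^{-1}+I_1\big)^{-1}(I_1-\Sigma_{12}\Sigma_{21})^{-1}\Sigma_{12}x$$

$$= (I_2-\Sigma_{21}\Sigma_{12})^{-1}\Sigma_{21}\big((1-\lambda)^{-1}+1\big)^{-1}(1-\lambda)^{-1}\Sigma_{12}x$$

$$= (1-\lambda)^{-1}\big((1-\lambda)^{-1}+1\big)^{-1}(1-\lambda)^{-1}\lambda x$$

$$= (1-\lambda)^{-1}(2-\lambda)^{-1}\lambda x.$$

Taking this together with (26) and (27), we get

$$\Big|\tfrac{1}{2}(S^{-1}+S_0^{-1})\Big| = \prod_{j=m}^{n}\frac{1}{4}\cdot\frac{2-\lambda_j}{1-\lambda_j}\cdot\frac{(2-\lambda_j)^2-\lambda_j}{(1-\lambda_j)(2-\lambda_j)} = \prod_{j=m}^{n}\frac{(4-\lambda_j)(1-\lambda_j)}{4(1-\lambda_j)^2} = \prod_{j=m}^{n}\frac{1-\frac{\lambda_j}{4}}{1-\lambda_j}, \tag{28}$$

so that, by (24), (25) and (28),

$$A(f,g) = |S|^{-1/4}\,\Big|\tfrac{1}{2}(S^{-1}+S_0^{-1})\Big|^{-1/2} = \prod_{j=m}^{n}(1-\lambda_j)^{-1/4}\ \prod_{j=m}^{n}\left(\frac{1-\lambda_j}{1-\frac{\lambda_j}{4}}\right)^{1/2} = \prod_{j=m}^{n}\frac{(1-\lambda_j)^{1/4}}{\big(1-\frac{\lambda_j}{4}\big)^{1/2}}, \tag{29}$$

where we recall that $0\le\lambda_j < 1$ for all $j$ since $\Sigma_{12}\Sigma_{21}$ is positive semi-definite and $(\Sigma^{11})^{-1} = I_1-\Sigma_{12}\Sigma_{21}$ is positive definite.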

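Now we put

$$t := \operatorname{Trace}(\Sigma_{12}\Sigma_{21}) = \sum_{j=m}^{n}\lambda_j$$

and note that our condition (19) is equivalent to

$$t = \operatorname{Trace}(\Sigma_{12}\Sigma_{21}) = \sum_{i=m}^{n}\ \sum_{j=n+p+1}^{n+p+k}\sigma_{ij}^2 \to 0 \quad\text{as } p\to\infty$$

uniformly in $m, n, k$. The inequalities

$$\prod_{j=m}^{n}(1-\lambda_j) \ge 1-\sum_{j=m}^{n}\lambda_j, \qquad 0\le\lambda_m,\dots,\lambda_n < 1$$

(as can be seen by induction over the number of terms) and

$$\frac{(1-\lambda)^{1/4}}{\big(1-\frac{\lambda}{4}\big)^{1/2}} \le e^{-\lambda/8}, \qquad 0\le\lambda < 1$$

(which is easily checked using basic calculus), together with (29), yield, for $t\le 1$,

$$(1-t)^{1/4} \le \prod_{j=m}^{n}(1-\lambda_j)^{1/4} \le A(f,g) \le e^{-t/8} \le 1. \tag{30}$$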
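The product formula for the Hellinger affinity and the two-sided bound can be sanity-checked numerically. The following is a small sketch under our own assumptions: the affinity is taken to be $\prod_j (1-\lambda_j)^{1/4}/(1-\lambda_j/4)^{1/2}$, the upper bound $e^{-t/8}$ is the one obtained from comparing the power series of the logarithms, and the function names are hypothetical. The $2\times 2$ case with correlation $\rho$ (so $\lambda = \rho^2$) is compared against the standard determinant formula for the affinity of two centered Gaussians.

```python
import math


def hellinger_affinity(lams):
    """Affinity as a product over the eigenvalues lam_j in [0, 1) of Sigma_12 Sigma_21."""
    value = 1.0
    for lam in lams:
        value *= (1.0 - lam) ** 0.25 / math.sqrt(1.0 - lam / 4.0)
    return value


def affinity_bounds(lams):
    """Lower and upper bounds in terms of the trace t = sum of the eigenvalues."""
    t = sum(lams)
    lower = max(1.0 - t, 0.0) ** 0.25   # trivial (zero) bound when t >= 1
    upper = math.exp(-t / 8.0)
    return lower, upper


def affinity_2x2(rho):
    """Direct formula |S|^(1/4) |S0|^(1/4) / |(S + S0)/2|^(1/2) for S = [[1, rho], [rho, 1]], S0 = I."""
    det_s = 1.0 - rho ** 2
    det_mean = 1.0 - (rho / 2.0) ** 2   # determinant of [[1, rho/2], [rho/2, 1]]
    return det_s ** 0.25 / math.sqrt(det_mean)


lams = [0.1, 0.2, 0.05]
lower, upper = affinity_bounds(lams)
print(lower, hellinger_affinity(lams), upper)
print(affinity_2x2(0.5), hellinger_affinity([0.25]))
```

Small traces force the affinity towards 1 from both sides, which is the mechanism behind the equivalence with condition (19).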
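From (30) we get that $t\to 0$ implies $A(f,g)\to 1$. Conversely, by the right half of (30), $A(f,g)\to 1$ implies $t\to 0$.

Thus the result is proved if $\Sigma_{11}$ and $\Sigma_{22}$ are the identity.

General case. By the spectral theorem we noted we can find $Q_1$ and $Q_2$ orthogonal such that

$$\Sigma_{11} = Q_1^T\Lambda_1Q_1 \quad\text{and}\quad \Sigma_{22} = Q_2^T\Lambda_2Q_2,$$

where $\Lambda_1$ and $\Lambda_2$ are diagonal. The transformation of $\mathbb{R}^{n-m+1+k}$ by $\begin{pmatrix}Q_1 & 0\\ 0 & Q_2\end{pmatrix}$ doesn't change Hellinger or variational distances and sends $\Sigma_{ij}\mapsto\tilde\Sigma_{ij}$, where

$$\tilde\Sigma_{11} = \Lambda_1, \quad \tilde\Sigma_{22} = \Lambda_2, \quad \tilde\Sigma_{12} = Q_1\Sigma_{12}Q_2^T, \quad \tilde\Sigma_{21} = Q_2\Sigma_{21}Q_1^T \quad\text{and hence}\quad \tilde\Sigma_{12}\tilde\Sigma_{21} = Q_1\Sigma_{12}\Sigma_{21}Q_1^T.$$

But then

$$\operatorname{Trace}(\tilde\Sigma_{12}\tilde\Sigma_{21}) = \operatorname{Trace}(\Sigma_{12}\Sigma_{21}) \tag{31}$$

since eigenvalues are unchanged. Sending

$$X^{(1)}\mapsto\Lambda_1^{-1/2}Q_1X^{(1)} \quad\text{and}\quad X^{(2)}\mapsto\Lambda_2^{-1/2}Q_2X^{(2)}$$

again doesn't change Hellinger and TV distances. If the resulting covariance matrices are $\Sigma^*_{ij}$ then

$$\Sigma^*_{11} = I_1, \quad \Sigma^*_{22} = I_2 \quad\text{and}\quad \Sigma^*_{12}\Sigma^*_{21} = \Lambda_1^{-1/2}\tilde\Sigma_{12}\Lambda_2^{-1}\tilde\Sigma_{21}\Lambda_1^{-1/2}.$$

Then

$$\operatorname{Trace}(\Sigma^*_{12}\Sigma^*_{21}) = \sum_{i=m}^{n}\ \sum_{j=n+p+1}^{n+p+k}\frac{\tilde\sigma_{ij}^2}{\lambda_i^{(1)}\lambda_j^{(2)}} \tag{32}$$

where $\tilde\Sigma_{12} = (\tilde\sigma_{ij})$ and $\lambda_i^{(1)}$, $\lambda_j^{(2)}$ are the diagonal elements of $\Lambda_1$, $\Lambda_2$. But since

$$M^{-1} \le \|\Sigma^{-1}\|^{-1} \le \lambda_i^{(1)}, \lambda_j^{(2)} \le \|\Sigma\| \le M, \qquad i = m,\dots,n,\quad j = n+p+1,\dots,n+p+k,$$

it follows that

$$M^{-2}\operatorname{Trace}(\Sigma_{12}\Sigma_{21}) \le \operatorname{Trace}(\Sigma^*_{12}\Sigma^*_{21}) \le M^{2}\operatorname{Trace}(\Sigma_{12}\Sigma_{21})$$

by (31) and (32). Now the theorem follows. ■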

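Remark 5.5 If we suppose $\Sigma$ to be a Laurent matrix, i.e. $\Sigma = (\sigma_{i-j})_{i,j\in\mathbb{Z}}$ with $\sigma_k = \sigma_{-k}$ for all $k$, then it is easy to see that (19) is equivalent to

$$\sum_{i=1}^{\infty}\ \sum_{k=p+i}^{\infty}\sigma_k^2 \to 0 \ \text{ as } p\to\infty, \quad\text{which evidently holds iff}\quad \sum_{i=1}^{\infty}\ \sum_{k=i+1}^{\infty}\sigma_k^2 < \infty,$$

i.e. iff

$$\sum_{k=1}^{\infty} k\,\sigma_k^2 < \infty.$$

This is the $W^{1/2,2}$ condition from [5] (see Theorem 5.2). □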
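The last equivalence in the remark is an interchange of summation, which is permitted because all terms are nonnegative. As a short worked step (our own spelling-out, with the Laurent coefficients written as $\sigma_k$):

```latex
\sum_{i=1}^{\infty}\ \sum_{k=i+1}^{\infty} \sigma_k^2
  \;=\; \sum_{k=2}^{\infty}\ \sum_{i=1}^{k-1} \sigma_k^2
  \;=\; \sum_{k=2}^{\infty} (k-1)\,\sigma_k^2 ,
\qquad
\tfrac{k}{2} \;\le\; k-1 \;\le\; k \quad (k \ge 2),
```

so the double sum is finite exactly when $\sum_{k\ge 1} k\,\sigma_k^2$ is finite.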

We now prove a closure under inversion result similar to those of Sections 2 and 3. Therefore, consider 
the cone of bounded self-adjoint positive definite operators E = ((Jy ) with bounded inverses and, for 
TO = 0,1,2,..., denote the subcone of all such operators with 

E 4 < °o 

|i— J I >m 
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by Fm- Define on Fm an equivalence relation by 

(aij) = (Tij) 

iff cTy- X(N - .i\ >m) = Tij x{\i - j\ > m), where 



1 if |i — il > TO, 
otherwise. 



X(|i - j| > m) = 

Let So := {(Tij x(|i — j| > ti)) correspond to such an equivalence class and define 



2 

ij- 

z— j|>m 



The quotient cone, which we again denote by Fm, is closed under convex combination and positive scaling 
but not under multiplication. However it is closed under Schur multiplication 

$\Sigma_0 * T_0 := (\sigma_{ij} \tau_{ij}\, \chi(|i-j| \ge m))$,

where $\Sigma_0$, $T_0$ and $\Sigma_0 * T_0$ are equivalence classes. Clearly

$\|\Sigma_0 * T_0\|_{F_m} \le \|\Sigma_0\|_{F_m} \|T_0\|_{F_m}$.
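The submultiplicativity under Schur multiplication follows from the elementary bound $\sum s_{ij}^2 t_{ij}^2 \le (\sum s_{ij}^2)(\sum t_{ij}^2)$. As a minimal sketch, the following Python code checks it on two small symmetric positive definite matrices. The matrices, the function names and the choice of the index set $|i-j| \ge m$ are assumptions made for illustration; the inequality holds in the same way if the strict condition $|i-j| > m$ is used instead.

```python
import math


def tail_norm(a, m):
    """Square root of the sum of a[i][j]^2 over all index pairs with |i - j| >= m."""
    p = len(a)
    return math.sqrt(sum(a[i][j] ** 2
                         for i in range(p) for j in range(p) if abs(i - j) >= m))


def schur_product(a, b):
    """Entrywise (Schur) product of two square matrices of equal size."""
    p = len(a)
    return [[a[i][j] * b[i][j] for j in range(p)] for i in range(p)]


# Two small symmetric positive definite matrices (illustrative values only).
sigma = [[2.0, 1.0, 0.5], [1.0, 2.0, 1.0], [0.5, 1.0, 2.0]]
tau = [[3.0, -1.0, 0.2], [-1.0, 3.0, -1.0], [0.2, -1.0, 3.0]]

m = 1
lhs = tail_norm(schur_product(sigma, tau), m)      # norm of the Schur product
rhs = tail_norm(sigma, m) * tail_norm(tau, m)      # product of the norms
print(lhs <= rhs)
```

Only the entries away from the diagonal enter the comparison, which is why the quantity is well defined on equivalence classes.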

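Now note that $F_0 \subset F_1 \subset \cdots$ and put

$F := \bigcup_{m \ge 0} F_m$

so that a matrix $\Sigma = (\sigma_{ij})$ is in $F$ iff $\|\Sigma\|_{F_m} < \infty$ for some $m \ge 0$. For $\Sigma \in \bigcap_{m \ge 0} F_m = F_0$, one has that

$\sup_{m \ge 0} \|\Sigma\|_{F_m}^2 = \|\Sigma\|_{F_0}^2 = \sum_{i,j} \sigma_{ij}^2$

is the usual Frobenius norm.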



We note the connection to BO. Evidently, $\mathrm{BO}_m \subset F_{m+1}$ for all $m \ge 0$ and hence $\mathrm{BO} \subset F$. In fact, $F$
is the closure of BO in the Frobenius metric $d_F(\Sigma, T) := \|\Sigma - T\|_{F_0}$. To see the latter, note that $F$ is
closed under $d_F(\cdot, \cdot)$ and take $\Sigma = (\sigma_{ij}) \in F$, so that $\Sigma \in F_{m_0}$, i.e. $\sum_{|i-j| \ge m_0} \sigma_{ij}^2 < \infty$, for some $m_0 \ge 0$.

Then, putting $B_m := (\sigma_{ij}\, \chi(|i-j| < m)) \in \mathrm{BO}_m$ for $m = 0, 1, \dots$, it holds that

$d_F(\Sigma, B_m)^2 = \|\Sigma - B_m\|_{F_0}^2 = \|\Sigma - B_m\|_{F_m}^2 = \sum_{|i-j| \ge m} \sigma_{ij}^2 \to 0$ as $m \to \infty$.

We will see now that, besides BDO and W (see Theorems 2.3 and 3.1), which are the closures of BO under
$\|\cdot\|$ and the norm of W, respectively, also the Frobenius closure $F$ of BO is inverse closed.



Theorem 5.6 F is closed under inversion. 



As we have noted in Theorem 5.4, all operators in F are beta mixing so that we conclude that at least 
on F beta mixing is preserved under inversion. 

Proof. Let $Y = (\dots, Y_m, \dots, Y_n, \dots)^T$ have covariance operator $\Sigma^{-1}$ and, following our previous practice
from the proof of Theorem 5.4, consider $Y^{(1)} := (Y_m, \dots, Y_n)^T$ and $Y^{(2)} := (Y_{n+p+1}, \dots, Y_{n+p+k})^T$
and the corresponding covariance matrices $\Sigma^{11}$, $\Sigma^{12}$, $\Sigma^{21}$, $\Sigma^{22}$. Following our previous argument we need
only check that

$\mathrm{Trace}(\Sigma^{12}\Sigma^{21}) \to 0$

as $p \to \infty$ uniformly in $m, n, k$. As before we can reduce to the case where $\Sigma_{11}$ and $\Sigma_{22}$ are the identity.
Hence, by formula (23),

$\Sigma^{12} = -(I_1 - \Sigma_{12}\Sigma_{21})^{-1} \Sigma_{12}$ and $\Sigma^{21} = (\Sigma^{12})^T = -\Sigma_{21} (I_1 - \Sigma_{12}\Sigma_{21})^{-1}$,
so that

$\mathrm{Trace}(\Sigma^{12}\Sigma^{21}) = \mathrm{Trace}(\Sigma^{21}\Sigma^{12}) = \mathrm{Trace}(\Sigma_{21} (I_1 - \Sigma_{12}\Sigma_{21})^{-2} \Sigma_{12})$. (33)

Next, note that 

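$\|\Sigma_{12}\Sigma_{21}\|_F^2 = \sum_{i=m}^{n} \lambda_i^2 \le \Big( \sum_{i=m}^{n} \lambda_i \Big)^2 = (\mathrm{Trace}(\Sigma_{12}\Sigma_{21}))^2$, (34)

where $\lambda_m, \dots, \lambda_n$ are the eigenvalues of $\Sigma_{12}\Sigma_{21}$. But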

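$\mathrm{Trace}(\Sigma_{12}\Sigma_{21}) = \sum_{j=m}^{n} \lambda_j = \sum_{i=m}^{n} \sum_{j=n+p+1}^{n+p+k} \sigma_{ij}^2 \to 0$ (35)

as $p \to \infty$ since $\Sigma \in F_{m_0}$ for some $m_0$. Hence, for $p$ large enough, we can write

$(I_1 - \Sigma_{12}\Sigma_{21})^{-2} = \sum_{j=0}^{\infty} \binom{-2}{j} (-\Sigma_{12}\Sigma_{21})^j = I_1 + 2\Sigma_{12}\Sigma_{21} + O(\|\Sigma_{12}\Sigma_{21}\|_F^2)$. (36)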

Then, by (33) and (36), 

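$\mathrm{Trace}(\Sigma^{12}\Sigma^{21}) = \mathrm{Trace}(\Sigma_{12}\Sigma_{21}) + 2\, \mathrm{Trace}((\Sigma_{12}\Sigma_{21})^2) + O(\|\Sigma_{12}\Sigma_{21}\|_F^2)$.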
In view of (34) and (35), the proof is complete. ■ 
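The two ingredients of this proof, the block inverse formula and the second order expansion of the trace, can be checked numerically on a small example. The sketch below assumes a covariance of the form $\begin{pmatrix} I & C \\ C^T & I \end{pmatrix}$ with an arbitrarily chosen small block $C$ playing the role of $\Sigma_{12}$; the helper names and the numbers are illustrative and not part of the paper.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse(a):
    """Gauss-Jordan elimination with partial pivoting (adequate for tiny examples)."""
    n = len(a)
    aug = [list(map(float, a[i])) + identity(n)[i] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [x / d for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def upper_right_of_inverse(c):
    """Upper-right block of the inverse of [[I, C], [C^T, I]], computed directly."""
    r, s = len(c), len(c[0])
    ct = transpose(c)
    top = [identity(r)[i] + list(c[i]) for i in range(r)]
    bottom = [list(ct[i]) + identity(s)[i] for i in range(s)]
    inv = inverse(top + bottom)
    return [row[r:] for row in inv[:r]]


def upper_right_by_formula(c):
    """The same block via -(I - C C^T)^{-1} C."""
    r = len(c)
    cct = matmul(c, transpose(c))
    i_minus = [[identity(r)[i][j] - cct[i][j] for j in range(r)] for i in range(r)]
    return [[-x for x in row] for row in matmul(inverse(i_minus), c)]


c = [[0.1, 0.05], [0.0, 0.2]]                              # illustrative Sigma_12
direct = upper_right_of_inverse(c)
formula = upper_right_by_formula(c)
cct = matmul(c, transpose(c))                              # Sigma_12 Sigma_21
exact = trace(matmul(direct, transpose(direct)))           # Trace(Sigma^12 Sigma^21)
second_order = trace(cct) + 2 * trace(matmul(cct, cct))    # expansion without the O-term
print(direct, formula, exact, second_order)
```

For a small block $C$ the two traces agree up to terms of third order in $C C^T$, which is the mechanism that transfers the decay of $\mathrm{Trace}(\Sigma_{12}\Sigma_{21})$ to the inverse.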

6 Applications to Statistics 

This paper was motivated by the problem of estimating the covariance matrix of $N$ independent identically
distributed $p$-vectors, $X_1, \dots, X_N$, with a common $\mathcal{N}(0, \Sigma_p)$ distribution. In [1, 2] Bickel and Levina
show that covariance matrices which are approximable by banded matrices could be well estimated, in 
the operator norm, by banded empirical covariance matrices, and accordingly their inverses could be ap- 
proximated by the inverses of the estimates above. Conversely, inverse covariance matrices approximable 
by banded matrices could be well approximated by data dependent banded matrices and now the co- 
variance matrices themselves would be approximated by the inverses of the banded matrices above. The 
bounds developed above enable us to approximate both a covariance and its inverse by banded matrices 
simultaneously in a very explicit way. 

Specifically consider $\Sigma_p$ as the top left $p \times p$ block of a banded matrix $\Sigma : \ell^2 \to \ell^2$ with a bounded
inverse $\Sigma^{-1}$. Let $\|\cdot\|$ be the operator norm and put

$\delta_k(\Sigma) := \mathrm{dist}(\Sigma, \mathrm{BO}_k) = \min\{ \|\Sigma - B\| : B \in \mathrm{BO}_k \} =: \|B_k(\Sigma) - \Sigma\|$

with $B_k(\Sigma) \in \mathrm{BO}_k$. Theorem 2.3, specialized to $A$ positive definite and self-adjoint, says that

■m[m — 20fc) m \k + I J 

where $\delta_k = \delta_k(\Sigma)$, $m^{-1} = m^{-1}(\Sigma)$ is the norm of $\Sigma^{-1}$, $M = M(\Sigma)$ is the norm of $\Sigma$ and $\kappa = \kappa(\Sigma) = M/m$
is the condition number of $\Sigma$. Let



1 N 



1=1 



be the empirical covariance of $X_{p \times 1}$. The individual elements $\hat\sigma_{ij}$ of $\hat\Sigma_p$ approach the corresponding $\sigma_{ij}$
as $N \to \infty$ with high probability, but $\hat\Sigma_p$ fails to have an inverse if $p > N$ and its eigenstructure, in
general, diverges from that of $\Sigma$ if $p$ is commensurate with or much larger than $N$, i.e. as $p \to \infty$ as well as
$N \to \infty$ with $p/N \to c$, $0 < c \le \infty$, see Johnstone [7]. However, we can, under very mild conditions on $p$ and $N$,
find $k_N \to \infty$ such that

$\|B_{k_N}(\hat\Sigma_p) - B_{k_N}(\Sigma_p)\| \to 0$ (38)

in probability, where

$B_k(A) := (a_{ij}\, \chi(|i-j| \le k))_{i,j=1}^{p}$

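if $A = (a_{ij})_{i,j=1}^{p}$. In particular, for Gaussian $X$ this is true if $(\log p)/N \to 0$. Therefore if $\Sigma \in \mathrm{BDO}$, so that
$\delta_k(\Sigma) \to 0$ as $k \to \infty$, this yields an operator norm consistent estimate of $\Sigma_p$, that is

$\|B_{k_N}(\hat\Sigma_p) - \Sigma_p\| \le \|B_{k_N}(\hat\Sigma_p) - B_{k_N}(\Sigma_p)\| + \|B_{k_N}(\Sigma_p) - \Sigma_p\| \to 0$. (39)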
We can deduce from (37) that if $\|\Sigma^{-1}\| < \infty$ then

$\|B_{n k_N}(\Sigma_p^{-1}) - \Sigma_p^{-1}\| \to 0$ (40)

as $n \to \infty$ slowly with $k_N$. It is, of course, clear from (38) and (40) that $B_{k_N}^{-1}(\hat\Sigma_p)$ will eventually exist,
be self-adjoint positive definite and consistently estimate $\Sigma_p^{-1}$. However, this is rather unsatisfactory in
practice as well as theory since $B_{k_N}^{-1}(\hat\Sigma_p)$ in general does not have a band structure (meaning that it
is supported on all its diagonals). In particular, if we assume that $\Sigma^{-1} \in \mathrm{BO}$, so that by Theorem 2.3,
$\Sigma \in \mathrm{BDO}$, our estimate would not reflect this information. The assumption that $\Sigma^{-1}$ belongs to $\mathrm{BO}_k$ has,
in the Gaussian case, a statistical interpretation. It implies that $X_i$ is independent of $\{X_j : |i-j| > k\}$
given $\{X_j : |i-j| \le k,\ j \ne i\}$. The assumption that $\Sigma_p \in \mathrm{BO}_k$ has a different interpretation, implying
that $X_i$ is independent (unconditionally) of $\{X_j : |i-j| > k\}$. One of the interesting consequences of
Theorem 2.3 is that it tells us that conditional independence (for a band structure) cannot occur unless
there is approximate conditional independence and vice versa. That point aside, we are left with a good
but not "natural" estimate for $\Sigma^{-1}$ if we assume $\Sigma^{-1} \in \mathrm{BO}$. However, our approach to Theorem 2.3 tells
us precisely what to do.
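The contrast between a banded inverse covariance and a merely band-dominated covariance can be seen on a standard textbook example that is not taken from the paper: the covariance with entries $\rho^{|i-j|}$ has no zero entries, yet its inverse is tridiagonal. The following Python sketch, with hypothetical function names and an arbitrary choice of $p$ and $\rho$, verifies this by multiplying the two matrices.

```python
def ar1_covariance(p, rho):
    """Covariance matrix with entries rho^|i-j| (a standard AR(1)-type example)."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]


def ar1_precision(p, rho):
    """Its inverse for p >= 2, which is tridiagonal: entries vanish when |i - j| > 1."""
    c = 1.0 / (1.0 - rho * rho)
    prec = [[0.0] * p for _ in range(p)]
    for i in range(p):
        prec[i][i] = c * (1.0 if i in (0, p - 1) else 1.0 + rho * rho)
        if i + 1 < p:
            prec[i][i + 1] = prec[i + 1][i] = -c * rho
    return prec


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


p, rho = 5, 0.5                      # illustrative size and correlation
cov = ar1_covariance(p, rho)
prec = ar1_precision(p, rho)
product = matmul(cov, prec)          # should be the identity up to rounding
print(max(abs(product[i][j] - (1.0 if i == j else 0.0))
          for i in range(p) for j in range(p)))
```

In this example each coordinate is conditionally independent of all non-neighbours given its two neighbours, while the unconditional dependence decays geometrically in $|i-j|$, so the covariance itself is well approximated by banded matrices.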

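Suppose $B_k(\hat\Sigma_p)$ is our estimate of $\Sigma_p$. Let $m_k$, $M_k$ be the minimal and maximal absolute eigenvalues
of $B_k(\hat\Sigma_p)$ and

$\gamma := 2(M_k + m_k)^{-1}$.

Then an $nk$ banded estimate of $\Sigma_p^{-1}$ is just

$\gamma \sum_{j=0}^{n} (I - \gamma B_k(\hat\Sigma_p))^j$.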
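A minimal Python sketch of this truncated Neumann series is given below. It is an illustration under stated assumptions rather than the authors' implementation: the input is a deterministic tridiagonal matrix standing in for the banded estimate, its extreme eigenvalues are known in closed form, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def band(a, k):
    """Keep the entries with |i - j| <= k and set all others to zero."""
    p = len(a)
    return [[a[i][j] if abs(i - j) <= k else 0.0 for j in range(p)] for i in range(p)]


def neumann_inverse(b, m_k, M_k, n):
    """gamma * sum_{j=0}^{n} (I - gamma * b)^j with gamma = 2 / (M_k + m_k)."""
    p = len(b)
    gamma = 2.0 / (M_k + m_k)
    eye = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    r = [[eye[i][j] - gamma * b[i][j] for j in range(p)] for i in range(p)]
    term = eye                      # current power (I - gamma * b)^j, starting at j = 0
    total = [row[:] for row in eye]
    for _ in range(n):
        term = matmul(term, r)
        total = [[total[i][j] + term[i][j] for j in range(p)] for i in range(p)]
    return [[gamma * x for x in row] for row in total]


# Hypothetical banded input: tridiagonal, 2 on the diagonal and -0.5 next to it.
p, k = 8, 1
b = band([[2.0 if i == j else -0.5 for j in range(p)] for i in range(p)], k)
# For this Toeplitz example the extreme eigenvalues are 2 -/+ cos(pi / (p + 1)).
m_k = 2.0 - math.cos(math.pi / (p + 1))
M_k = 2.0 + math.cos(math.pi / (p + 1))

estimate = neumann_inverse(b, m_k, M_k, n=3)       # band-width at most n * k = 3
residual = matmul(b, neumann_inverse(b, m_k, M_k, n=20))
print(max(abs(residual[i][j] - (1.0 if i == j else 0.0))
          for i in range(p) for j in range(p)))
```

Each additional power of a matrix of band-width $k$ widens the band by at most $k$, so the sum up to $j = n$ has band-width at most $nk$, while the residual $I - B\,\gamma\sum_{j \le n}(I - \gamma B)^j = (I - \gamma B)^{n+1}$ decays geometrically at rate $(M_k - m_k)/(M_k + m_k)$.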

We are left with the problem of how to choose $k$ and $n$. In theory, if we have some notion or make
assumptions about the magnitude of the "bias" $\delta_k(\Sigma)$ and calculate stochastic bounds on the "variance"
$\|B_k(\hat\Sigma_p) - B_k(\Sigma_p)\|$ and make assumptions about how many zeros $\Sigma^{-1}$ has, we can use (37) to estimate
the optimal choices of $k$ and $n$. In practice, it is better to use some data determined choice, e.g. by
cross-validation, see Bickel, Levina [1, 2]. However, we believe that (37) and (39) can be used to compute
minimax bounds and oracle inequalities on the performance of estimates of $\Sigma_p^{-1}$, from the ones obtained
for estimates of $\Sigma_p$, see Cai, Zhang, Zhou [3]. It should be clear that whatever we have said of banding
applies to generalized banding up to a permutation also.

The application of extended generalized banding is satisfactory if we think of the coordinates of X as 
corresponding to labelled points on a manifold, as is reasonable in (say) geophysical applications, where 
X is the state of some variable such as pressure, at a grid of points on the globe at some time. But it 
is not relevant if the labels are meaningless as in microarray genomics, where the coordinates simply
label genes. However, it should now be clear that our second generalization to generalized banding up
to some unknown permutation $\pi$ deals with such situations. It enables us to define classes of matrices $\Sigma$
in terms of their dependency graph, defined in terms of $\Sigma^{-1}$ by having vertices correspond to coordinates
of $X$ with an edge between $i$ and $j$ if the $(i,j)$th entry of $\Sigma^{-1}$ is different from 0. However, although
the determination of g is dictated by the situation, estimation of $\pi$ is nontrivial and will be pursued
elsewhere.